Abstract

Network or graph structures are ubiquitous in the study of complex
systems. Often, we are interested in the complexity trends of these systems
as they evolve under some dynamic. An example might be looking at the
complexity of a food web as species enter an ecosystem via migration or
speciation, and leave via extinction.

In a previous paper, a complexity measure of networks was proposed 
based on the complexity is information content paradigm. To apply this 
paradigm to any object, one must fix two things: a representation lan- 
guage, in which strings of symbols from some alphabet describe, or stand 
for the objects being considered; and a means of determining when two 
such descriptions refer to the same object. With these two things set, the 
information content of an object can be computed in principle from the 
number of equivalent descriptions describing a particular object. 

The previously proposed representation language had the deficiency 
that the fully connected and empty networks were the most complex for 
a given number of nodes. A variation of this measure, called zcomplexity, 
applied a compression algorithm to the resulting bitstring representation, 
to solve this problem. Unfortunately, zcomplexity proved too computa- 
tionally expensive to be practical. 

In this paper, I propose a new representation language that encodes 
the number of links along with the number of nodes and a representation 
of the linklist. This, like zcomplexity, exhibits minimal complexity for 
fully connected and empty networks, but is as tractable as the original 
measure. 

This measure is extended to directed and weighted links, and several 
real world networks have their network complexities compared with ran- 
domly generated model networks with matched node and link counts, and 
matched link weight distributions. Compared with the random networks, 
the real world networks have significantly higher complexity, as do artifi- 
cially generated food webs created via an evolutionary process, in several 
well known ALife models. 
1 Introduction



This work situates itself firmly within the complexity is information content
paradigm, a topic that dates back to the 1960s with the work of Kolmogorov,
Chaitin and Solomonoff. Prokopenko et al. provide a recent review of
the area, and argue for the importance of information theory to the notion
of complexity measure. In [32], I argue that information content provides an
overarching complexity measure that connects the many and various complexity
measures proposed (see [14] for a review) up to that time. In some ways,
information-based complexity measures are privileged, motivating the present
work to develop a satisfactory information-based complexity measure of
networks.

The idea is fairly simple. In most cases, there is an obvious prefix-free 
representation language within which descriptions of the objects of interest can 
be encoded. There is also a classifier of descriptions that can determine if two 
descriptions correspond to the same object. This classifier is commonly called
the observer, denoted $O(x)$.

To compute the complexity of some object $x$, count the number of equivalent
descriptions $\omega(\ell, x)$ of length $\ell$ that map to the object $x$ under the agreed
classifier. Then the complexity of $x$ is given in the limit as $\ell \to \infty$:

$$C(x) = \lim_{\ell\to\infty} \left(\ell \log N - \log \omega(\ell, x)\right) \tag{1}$$

where N is the size of the alphabet used for the representation language. 

Because the representation language is prefix-free, every description $y$ in
that language has a unique prefix of length $s(y)$. The classifier does not care
what symbols appear after this unique prefix. Hence $\omega(\ell, O(y)) \ge N^{\ell - s(y)}$. As $\ell$
increases, $\omega$ must increase as fast as, if not faster than, $N^{\ell}$, and do so monotonically.
Therefore $C(O(y))$ decreases monotonically with $\ell$, but is bounded below by 0.
So equation (1) converges.
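
The bound used in this argument can be written out explicitly. The following is a sketch only, where $C_\ell$ is a label introduced here (not in the original text) for the finite-$\ell$ value of the bracketed expression in equation (1):

```latex
\begin{align*}
C_\ell(O(y)) &= \ell \log N - \log \omega(\ell, O(y)) \\
             &\le \ell \log N - (\ell - s(y)) \log N
               && \text{since } \omega(\ell, O(y)) \ge N^{\ell - s(y)} \\
             &= s(y) \log N .
\end{align*}
```

Under this reading, the complexity of an object is at most the information in the unique prefix of any one of its descriptions, and every additional equivalent description can only lower the value.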

The relationship of this algorithmic complexity measure to more familiar
measures such as Kolmogorov (KCS) complexity is given by the coding
theorem [22, Thm 4.3.3]. In this case, the descriptions are halting programs of
some given universal Turing machine $U$, which is also the classifier. Equation (1)
then corresponds to the logarithm of the universal a priori probability, which 
is a kind of formalised Occam's razor that gives higher weight to simpler (in 
the KCS sense) computable theories for generating priors in Bayesian reason- 
ing. The difference between this version of C and KCS complexity is bounded 
by a constant independent of the complexity of x, so these measures become 
equivalent in the limit as message size goes to infinity. 

Many measures of network properties have been proposed, starting with 
node count and connectivity (no. of links), and passing in no particular order 
through cyclomatic number (no. of independent loops), spanning height (or 
width), no. of spanning trees, distribution of links per node and so on. Graphs 
tend to be classified using these measures — small world graphs tend to have 
small spanning height relative to the number of nodes and scale free networks 
exhibit a power law distribution of node link count. 
Some of these measures are related to graph complexity, for example node
count and connectivity can be argued to be lower and upper bounds of the
network complexity respectively. More recent attempts include offdiagonal
complexity [9], which was compared with an earlier version of this proposal in
[36], and medium articulation [TH].

However, none of the proposed measures gives a theoretically satisfactory
complexity measure, which in any case is context dependent (i.e. dependent on
the observer $O$ and the representation language).

In setting the classifier function, we assume that only the graph's topology 
counts — positions, and labels of nodes and links are not considered important. 
Links may be directed or undirected. We consider the important extension of 
weighted links in a subsequent section. 

A plausible variant classifier problem might occur when the network de- 
scribes a dynamical system (such as interaction matrix of the EcoLab model, 
which will be introduced later). In such a case, one might be interested in clas- 
sifying the network according to numbers and types of dynamical attractors, 
which will be strongly influenced by the cyclomatic count of the core network, 
but only weakly influenced by the periphery. However, this problem lies beyond 
the scope of this paper. 

The issue of representation language, however, is far more problematic. In
some cases, e.g. with genetic regulatory networks, there may be a clear
representation language, but for many cases there is no uniquely identifiable
language. However, the invariance theorem [22, Thm 2.1.1] states that the
difference in complexity determined by two different Turing complete representation
languages (each of which is determined by a universal Turing machine) is at
most a constant, independent of the objects being measured. Thus, in some
sense it does not matter what representation one picks: one is free to pick a
representation that is convenient. However, one must take care with non-Turing
complete representations.

In the next section, I will present a concrete graph description language that
can be represented as binary strings, and is amenable to analysis. The quantity
$\omega$ in eq. (1) can be simply computed from the size of the automorphism group,
for which computationally feasible algorithms exist [23].

The notion of complexity presented in this paper naturally marries with
thermodynamic entropy $S$ [21]:

$$S_{\max} = C + S \tag{2}$$

where $S_{\max}$ is called potential entropy, i.e. the largest possible value that entropy
can assume under the specified conditions. The interest here is that a dynamical
process updating network links can be viewed as a dissipative system, with
links being made and broken corresponding to a thermodynamic flux. It would
be interesting to see if such processes behave according to the maximum entropy
production principle [10] or the minimum entropy production principle [27].

In artificial life, the issue of complexity trends in evolution is extremely
important [5]. I have explored the complexity of individual Tierran organisms [33]
[55], which, if anything, shows a trend to simpler organisms. However, it is entirely
plausible that complexity growth takes place in the network of ecological
interactions between individuals. For example, in the evolution of the eukary- 
otic cell, mitochondria are simpler entities than the free-living bacteria they 
were supposedly descended from. A computationally feasible measure of net- 
work complexity is an important prerequisite for further studies of evolutionary 
complexity trends. 

In the results section, several well-known food web datasets are analysed, and
compared with randomly-shuffled "neutral models". An intriguing "complexity
surplus" is observed, which is also observed in the food webs of several different
ALife systems that have undergone evolution.

2 Representation Language 

One very simple implementation language for undirected graphs is to label the
nodes $1..n$, and the links by the pair $i < j$ of nodes that the links connect.
The linklist can be represented simply by an $L = n(n-1)/2$ length bitstring,
where the $\frac{1}{2}j(j-1) + i$th position is 1 if link $(i, j)$ is present, and 0 otherwise.

The directed case requires doubling the size of the linklist, i.e. $L = n(n-1)$.
We also need to prepend the string with the value of $n$ in order to make it
prefix-free. The simplest approach is to interpret the number of leading 1s as the
number $n$, which adds a term $n + 1$ to the measured complexity.
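
As an illustration, this simple scheme can be sketched in a few lines of Python for the undirected case. This is a minimal sketch rather than the EcoLab implementation: the function names are hypothetical, and nodes are labelled from 0 (an assumption made here so that the position $j(j-1)/2 + i$ fills the bitstring contiguously).

```python
from itertools import combinations


def linklist_bits(n, links):
    """Return the n(n-1)/2 bit linklist of an undirected graph on nodes 0..n-1."""
    present = {frozenset(link) for link in links}
    bits = ["0"] * (n * (n - 1) // 2)
    for i, j in combinations(range(n), 2):  # every pair with i < j
        if frozenset((i, j)) in present:
            bits[j * (j - 1) // 2 + i] = "1"
    return "".join(bits)


def encode_unary_prefix(n, links):
    """Prefix-free description: n leading 1s, a 0 stop bit, then the linklist."""
    return "1" * n + "0" + linklist_bits(n, links)
```

For a three-node path with links (0, 1) and (1, 2), the description is the prefix `1110` followed by the linklist `101`, so its length is $n + 1 + n(n-1)/2 = 7$ bits.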

This proposal was analysed in [36], and has the unsatisfactory property
that the fully connected or empty networks are maximally complex for a given
node count. An alternative scheme is to also include the link count as part of
the prefix, and to use binary coding for both the node and link counts. The
sequence will start with $\lceil \log_2 n \rceil$ 1s, followed by a zero stop bit, so the prefix
will be $2\lceil \log_2 n \rceil + \lceil \log_2 L \rceil + 1$ bits.

This scheme entails that some bitstrings are not valid networks, namely
ones where the link count does not match the number of 1s in the linklist. We
can, however, use rank encoding [25] of the linklist to represent the link pattern.
The number of possible linklists corresponding to a given node/link specification
is given by

$$\Omega = \binom{L}{l} = \frac{L!}{(L-l)!\,l!} \tag{3}$$

This will have a minimum value of 1 at $l = 0$ (empty network) and $l = L$, the
fully connected network.

Finally, we need to compute $\omega$ of the linklist, which is just the total number
of possible renumberings of the nodes ($n!$), divided by the size of the graph
automorphism group, which can be practically computed by Nauty [33], or a new
algorithm I developed called SuperNOVA [38], which exhibits better performance
on sparsely linked networks.
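
The pieces described in this section can be put together in a small brute-force sketch for undirected graphs. It is not the EcoLab library code: the function names are hypothetical, the automorphism group is counted by exhaustive search over permutations (feasible only for very small $n$, where Nauty or SuperNOVA would be used in practice), and combining the terms as prefix length plus $\log_2$ of the number of linklists minus $\log_2 \omega$ is an assumption based on equation (1) and the description above.

```python
from itertools import permutations
from math import ceil, comb, factorial, log2


def automorphism_count(n, links):
    """Count node permutations mapping the link set onto itself (brute force)."""
    edges = {frozenset(link) for link in links}
    count = 0
    for p in permutations(range(n)):
        # A bijection that maps every link to a link maps the set onto itself.
        if all(frozenset(p[v] for v in edge) in edges for edge in edges):
            count += 1
    return count


def complexity(n, links):
    """Complexity in bits of an undirected graph on nodes 0..n-1 (n >= 2, no self-links)."""
    edges = {frozenset(link) for link in links}
    L = n * (n - 1) // 2                     # number of possible links
    l = len(edges)                           # number of links present
    prefix = 2 * ceil(log2(n)) + ceil(log2(L)) + 1
    omega = factorial(n) // automorphism_count(n, edges)
    return prefix + log2(comb(L, l)) - log2(omega)
```

With four nodes, the empty and fully connected networks both evaluate to the 8-bit prefix alone, while a star and its complement (a triangle plus an isolated node) give equal, larger values.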

A network $A$ that has a link wherever $B$ doesn't, and vice-versa, might be
called a complement of $B$. A bitstring for $A$ can be found by inverting the 1s and
0s in the linklist part of the network description. Obviously, $\omega(A, L) = \omega(B, L)$,
so the complexity of a network is equal to that of its complement, as can be
seen in Figure 1.

Figure 1: The new complexity measure as a function of link count for all
networks with 8 nodes. This shows the strong dependence of complexity on link
count, and the symmetry between networks and their complements.

A connection can be drawn from this complexity measure to that proposed
in [11]. In their proposal, nodes are labelled, so the network is uniquely specified
by the node and link counts, along with a rank-encoded linklist. Also, nodes
may link to themselves, so $L = n^2$. Consequently, their complexity measure will
be

$$C = 2\lceil \log_2 n \rceil + \lceil \log_2 L \rceil + 1 + \log_2 \frac{L!}{(L-l)!\,l!} \tag{4}$$

This is, to within a term of order $\log_2 n$ corresponding to the lead-in prefix
specifying the node count, equal to the unnumbered formula given on page 326
of [11].



3 New complexity measure compared with the 
previous proposals 

Figures 2 and 3 [36, Fig. 1] show zcomplexity plotted against the new complexity
and the original complexity proposal, respectively, for all networks of order 8.
The new complexity proposal is quite well correlated with zcomplexity, and is in
this example about 3 bits higher than zcomplexity. This is just the difference in
prefixes between the two schemes (the original proposal used eight 1 bits followed by
a stop bit, 9 bits overall, to represent the node count, whereas the new scheme uses
three 1 bits and a stop bit to represent the field width of the node count, 3 bits for
the node field and 5 bits for the link count field, 12 bits overall). Data points
appearing above the line correspond to networks whose linkfields compress better
with the new algorithm than the scheme used with zcomplexity. Conversely,
data points below the line are better compressed with the old algorithm.

Figure 2: $C_z$ plotted against $C$ for all networks of order 8. The diagonal line
corresponds to $C = C_z + 3$.

Figure 3: $C_z$ plotted against the original $C$ from [36] for all networks of order 8,
reproduced from that paper.

We can conclude that the new scheme usually compresses the link list field
better than the run length encoding scheme employed in [36], and so is a better
measure of complexity than zcomplexity, as well as being far more tractable.
The slightly more complex prefix of the new scheme grows logarithmically with 
node count, so will ultimately be more compressed than the prefix of the old 
scheme, which grows linearly. 

4 Comparison with medium articulation 

In the last few years Wilhelm [221 HH] introduced a new complexity-like measure
that addresses the intuition that complexity should be minimal for the empty
and full networks, and peak for intermediate values (as in Figure 1). It is obtained
by multiplying an information quantity that increases with link count by a
different information quantity that falls. The resulting measure is therefore in square
bits, so one should perhaps consider the square root as the complexity measure.
Precisely, medium articulation is given by

$$MA = -\sum_{ij} w_{ij} \log \frac{w_{ij}}{\sum_k w_{ik} \sum_k w_{kj}} \times \sum_{ij} w_{ij} \log \frac{w_{ij}^2}{\sum_k w_{ik} \sum_k w_{kj}}, \tag{5}$$

where $w_{ij}$ is the normalised weight ($\sum_{ij} w_{ij} = 1$) of the link from node $i$ to
node $j$.
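
A minimal Python sketch of medium articulation follows. It assumes the standard form of the measure, namely the product of a mutual information term $\sum_{ij} w_{ij}\log\frac{w_{ij}}{\sum_k w_{ik}\sum_k w_{kj}}$ and a redundancy term $-\sum_{ij} w_{ij}\log\frac{w_{ij}^2}{\sum_k w_{ik}\sum_k w_{kj}}$ over weights normalised to sum to 1. The function name, the dictionary representation of links and the choice of base-2 logarithms are assumptions made here for illustration.

```python
from math import log2


def medium_articulation(weights):
    """weights maps a directed link (i, j) to its positive weight."""
    total = sum(weights.values())
    w = {link: x / total for link, x in weights.items()}  # normalise to sum 1
    out_sum, in_sum = {}, {}
    for (i, j), x in w.items():
        out_sum[i] = out_sum.get(i, 0.0) + x   # sum over k of w_ik
        in_sum[j] = in_sum.get(j, 0.0) + x     # sum over k of w_kj
    mutual = sum(x * log2(x / (out_sum[i] * in_sum[j]))
                 for (i, j), x in w.items())
    redundancy = -sum(x * log2(x * x / (out_sum[i] * in_sum[j]))
                      for (i, j), x in w.items())
    return mutual * redundancy
```

Under these assumptions a directed cycle has zero redundancy and a uniformly weighted network with every possible link (self-links included) has zero mutual information, so both score zero, while intermediate structures give a positive value.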

Figure 4 shows medium articulation plotted against $C$ for a sample of 1000
Erdős-Rényi networks up to order 500. There is no clear relationship between
medium articulation and complexity for the average network. Medium articulation
does not appear to discriminate between complex networks. However, if
we restrict our attention to simple networks (Figures 5 and 6), medium articulation
is strongly correlated with complexity, and so could be used as a proxy
for complexity for these cases.

5 Weighted links 

Whilst the information contained in link weights might be significant in some
circumstances (for instance the weights of a neural network can only be varied
in a limited range without changing the overall qualitative behaviour of the
network), of particular theoretical interest is to consider the weights as continuous
parameters connecting one network structure with another. For instance if a
network $X$ has the same network structure as the unweighted graph $A$, with $b$
links of weight 1 describing the graph $B$ and the remaining $a - b$ links of weight
$w$, then we would like the network complexity of $X$ to vary smoothly between
that of $A$ and $B$ as $w$ varies from 1 to 0. [16] introduced a similar measure.

Figure 4: Medium Articulation plotted against complexity for 1000 randomly
sampled Erdős-Rényi graphs up to order 500.

Figure 5: Medium Articulation plotted against complexity for 1000 randomly
sampled Erdős-Rényi graphs up to order 500 with no more than $2n$ links.

Figure 6: Medium Articulation plotted against complexity for 1000 randomly
sampled Erdős-Rényi graphs up to order 500 with more than $n(n-5)/2$ links.

The most obvious way of defining this continuous complexity measure is to
start with normalised weights $\sum_i w_i = 1$. Then arrange the links in weight
order, and compute the complexity of networks with just those links of weights
less than $w$. The final complexity value is obtained by integrating:

$$C(X = N \times L) = \int_0^1 C(N \times \{i \in L : w_i < w\})\, dw \tag{6}$$

Obviously, since the integrand is a stepped function, this is computed in practice 
by a sum of complexities of partial networks. 
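
The sum over partial networks can be sketched as follows. This is an illustrative sketch only: the function name is hypothetical, the unweighted complexity measure is passed in as a callable `complexity(n, links)` rather than fixed, and the weights are assumed to be normalised to sum to 1 with the integral running from 0 to 1.

```python
def weighted_complexity(n, weighted_links, complexity):
    """Integrate a step function: complexity of the links with weight < w, for w in [0, 1].

    weighted_links maps each link to a positive weight.
    complexity is any callable taking (n, list_of_links).
    """
    total = sum(weighted_links.values())
    ordered = sorted(((w / total, link) for link, w in weighted_links.items()),
                     key=lambda item: item[0])
    result, lower, included, k = 0.0, 0.0, [], 0
    while k < len(ordered):
        threshold = ordered[k][0]
        # For w in (lower, threshold], only the links already included have weight < w.
        result += (threshold - lower) * complexity(n, list(included))
        while k < len(ordered) and ordered[k][0] == threshold:
            included.append(ordered[k][1])
            k += 1
        lower = threshold
    # Above the largest weight, every link is present.
    result += (1.0 - lower) * complexity(n, list(included))
    return result
```

With a toy complexity function that simply counts links, three links of normalised weight 0.2, 0.3 and 0.5 give $0.2 \times 0 + 0.1 \times 1 + 0.2 \times 2 + 0.5 \times 3 = 2$.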

6 Comparing network complexity with the Erdős-Rényi
random model

I applied the weighted network complexity measure (6) to several well-known
real network datasets, obtained from Mark Newman's website [41, 20, 26], the
Pajek website [3 [391 SOI H3 12 [U [3] and Duncan Watts's website [Ml [H], with the
results shown in Table 1. The number of nodes and links of the networks varied
greatly, so the raw complexity values are not particularly meaningful, as the
computed value is highly dependent on these two network parameters. What is
needed is some neutral model for each network to compare the results to.

Figure 7: The distribution of complexities of the shuffled Narragansett Bay food
web [24]. Both a normal and a log-normal distribution have been fitted to the
data; the log-normal is a slightly better fit than the normal distribution. In
the bottom right hand corner is marked the complexity value computed for the
actual Narragansett food web (58.2 bits).

At first, one might want to compare the values to an Erdős-Rényi random
network with the same number of nodes and links. However, in practice, the
real network complexities are much less than that of an ER network with the
same number of nodes and links. This is because in our scheme, a network with
weighted links looks somewhat like a simpler network formed by removing
some of the weakest links from the original. An obvious neutral network model
with weighted network links has the weights drawn from a normal distribution
with mean 0. The sign of the weight can be interpreted as the link direction.
Because the weights in equation (6) are normalised, the complexity value is
independent of the standard deviation of the normal distribution. However,
such networks are still much more complex than the real networks, as the link
weight distribution doesn't match that found in the real network.

Instead, a simple way of generating a neutral model is to break and reattach
the network links to random nodes, without merging links, leaving the original
link weights unaltered. In the following experiment, we generate 1000 of these
shuffled networks. The distribution of complexities can be fitted to a log-normal
distribution, which gives a better likelihood than a normal distribution for all
networks studied here [5], although the difference between a log-normal and a
normal fit becomes less pronounced for more complex networks. Figure 7 shows
the distribution of complexity values computed by shuffling the Narragansett
food web, and the best fit normal and log-normal distributions.

In what follows we compute the average $\langle \ln C_{ER} \rangle$ and standard deviation
$\sigma_{ER}$ of the logarithm of the neutral model complexities. We can then compare
the network complexity with the ensemble of neutral models to obtain a
p-value, the probability that the observed complexity is that of a random network.
The p-values are so small that it is better to represent the value as a number of
standard deviations ("sigmas") that the logarithm of the measured complexity
value is from the mean logarithm of the shuffled networks. A value of 6 sigmas
corresponds to a p-value less than $1.3 \times 10^{-4}$, although this must be taken with
a certain grain of salt, as the distribution of shuffled complexity values has a
fatter tail than the fitted log-normal distribution. In none of these samples
did the shuffled network complexity exceed the original network's complexity,
meaning the p-value is less than $10^{-3}$. The difference $C - \exp(\langle \ln C_{ER} \rangle)$ is the
amount of information contained in the specific arrangement of links.
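
The shuffling procedure and the resulting statistics can be sketched as below. This is a sketch under stated assumptions, not the EcoLab code: the function names are hypothetical, links are directed and self-links are excluded, the complexity measure is supplied as a callable `complexity(n, weighted_links)` returning a positive number, and "without merging links" is implemented by sampling distinct node pairs.

```python
import random
from math import exp, log
from statistics import mean, stdev


def shuffle_links(n, weighted_links, rng):
    """Reattach every link to a random distinct node pair, keeping its weight."""
    weights = list(weighted_links.values())
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    return dict(zip(rng.sample(pairs, len(weights)), weights))


def complexity_surplus(n, weighted_links, complexity, samples=1000, seed=0):
    """Return (surplus, sigmas) of a network against its shuffled neutral models."""
    rng = random.Random(seed)
    c = complexity(n, weighted_links)
    log_c = [log(complexity(n, shuffle_links(n, weighted_links, rng)))
             for _ in range(samples)]
    mu, sigma = mean(log_c), stdev(log_c)
    surplus = c - exp(mu)  # measured value minus geometric mean of the controls
    sigmas = (log(c) - mu) / sigma if sigma > 0 else float("inf")
    return surplus, sigmas
```

Sampling distinct pairs from an explicit list needs memory of order $n^2$, which is acceptable for food webs of a few hundred species; a rejection-sampling loop would be one alternative for larger networks.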

Code implementing this algorithm is available as a C++ library from version
4.D36 onwards as part of the EcoLab system, an open
source modelling framework hosted at http://ecolab.sourceforge.net.



7 ALife models 
7.1 Tierra 

Tierra [29] is a well known artificial life system in which self reproducing
computer programs written in an assembly-like language are allowed to evolve. The
programs, or digital organisms, can interact with each other via template matching
operations, modelled loosely on the way proteins interact in real biological
systems. A number of distinct strategies evolve, including parasitism, where
organisms make use of another organism's code, and hyper-parasitism, where an
organism sets traps for parasites in order to steal their CPU resources. At any
point in time in a Tierra run, there is an interaction network between the species
present, which is the closest thing in the Tierra world to a foodweb.

Tierra is an aging platform, with the last release (v6.02) having been released
more than six years ago. For this work, I used an even older release (5.0), with
which I have had some experience. Tierra was originally written
in C for an environment where ints were 16 bits and long ints 32 bits. This posed
a problem for using it on the current generation of 64 bit computers, where the
word sizes are doubled. Some effort was needed to get the code 64 bit clean.
Secondly, a means of extracting the interaction network was needed. Whilst
Tierra provided the concept of "watch bits", which recorded whether a digital
organism had accessed another's genome or vice versa, it did not record which
other genome was accessed. So I modified the template matching code to log
the pair of genome labels that performed the template match to a file.

Having a record of interactions by genotype label, it is necessary to map
the genotype to phenotype. In Tierra, the phenotype is the behaviour of the
digital organism, and can be judged by running the organisms pairwise in a
tournament, to see what effect each has on the other. The precise details of
how this can be done are described in [53].
Having a record of interactions between phenotypes, and discarding self-self
interactions, there are a number of ways of turning that record into a foodweb.
The simplest way, which I adopted, was to sum the interactions between each pair
of phenotypes over a sliding window of 20 million executed instructions, and
to do this every 100 million executed instructions. This led to a time series of
around 1000 foodwebs for each Tierra run.
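
The windowed summation might look like the following sketch. It is not the Eco-tierra code: the function name and the event format (instruction count, acting phenotype, target phenotype) are hypothetical, and it assumes each window starts at a multiple of the sampling interval and is no longer than that interval, which the text does not specify.

```python
from collections import Counter


def foodweb_series(events, window, step):
    """Sum phenotype interactions over a window of `window` instructions every `step`.

    events is an iterable of (instruction_count, phenotype_a, phenotype_b).
    Returns a list of (window_start, Counter mapping (a, b) to interaction count),
    omitting windows in which no interaction was logged.
    """
    webs = {}
    for t, a, b in events:
        if a == b:
            continue  # discard self-self interactions
        start = (t // step) * step
        if t - start < window:
            webs.setdefault(start, Counter())[(a, b)] += 1
    return sorted(webs.items())
```

With the values quoted above, `window` would be 20 million and `step` 100 million executed instructions.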

In Tierra, parsimony pressure is controlled by the parameter SlicePow. CPU
time is allocated proportional to genome size raised to SlicePow. If SlicePow is
close to 0, then there is great evolutionary pressure for the organisms to get as
small as possible to increase their replication rate. When it is one, this pressure
is eliminated. In [35], I found that a SlicePow of around 0.95 was optimal.
If it were much higher, the organisms would grow so large and so rapidly that they
eventually occupy more than 50% of the soup, at which point they kill the soup
at their next Mal (memory allocation) operation. In this work, I altered the
implementation of Mal to fail if the request was more than the soup size
divided by the minimum population save threshold (usually around 10). Organisms
any larger than this will never appear in the Genebanker (Tierra's equivalent of
the fossil record), as their population can never exceed the save threshold. This
modification allows SlicePow = 1 runs to run for an extensive period of time
without the soup dying.



7.2 EcoLab

EcoLab was introduced by the author as a simple model of an evolving ecosystem
[31]. The ecological dynamics is described by an $n$-dimensional generalised
Lotka-Volterra equation:

$$\dot{n}_i = r_i n_i + \sum_j n_i \beta_{ij} n_j \tag{7}$$

where $n_i$ is the population density of species $i$, $r_i$ its growth rate and $\beta_{ij}$ the
interaction matrix. Extinction is handled via a novel stochastic truncation
algorithm, rather than the more usual threshold method. Speciation occurs by
randomly mutating the ecological parameters ($r_i$ and $\beta_{ij}$) of the parents, subject
to the constraint that the system remain bounded [30].

The interaction matrix is a candidate foodweb, but has too much information.
Its offdiagonal terms may be negative as well as positive, whereas for
the complexity definition (6), we need the link weights to be positive. There
are a number of ways of resolving this issue, such as ignoring the sign of the
off-diagonal term (i.e. taking its absolute value), or antisymmetrising the matrix
by subtracting its transpose, then using the sign of the offdiagonal term to
determine the link direction.

For the purposes of this study, I chose to subtract just the negative $\beta_{ij}$
terms from themselves and their transpose terms $\beta_{ji}$. This effects a maximal
encoding of the interaction matrix information in the network structure, with
link direction and weight encoding the direction and size of resource flow. The
effect is as follows:
• Both $\beta_{ij}$ and $\beta_{ji}$ are positive (the mutualist case). Neither offdiagonal
term changes, and the two nodes have links pointing in both directions,
with weights given by the two offdiagonal terms.

• Both $\beta_{ij}$ and $\beta_{ji}$ are negative (the competitive case). The terms are
swapped, and the signs changed to be positive. Again the two nodes
have links pointing in both directions, but the link direction reflects the
direction of resource flow.

• $\beta_{ij}$ and $\beta_{ji}$ are of opposite sign (the predator-prey or parasitic case).
Only a single link exists between species $i$ and $j$, whose weight is the
summed absolute values of the offdiagonal terms, and whose link direction
reflects the direction of resource flow.
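
The three cases above amount to moving each negative offdiagonal term, with its sign flipped, onto the transposed position. A minimal Python sketch, assuming the interaction matrix is given as a square list of lists and using a hypothetical function name, is:

```python
def foodweb_from_interaction_matrix(beta):
    """Convert an interaction matrix into positive directed link weights.

    Returns a dict mapping (i, j) to the weight in position (i, j) of the
    converted matrix. Diagonal terms are ignored.
    """
    n = len(beta)
    links = {}
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Keep a positive beta[i][j]; add the magnitude of a negative beta[j][i].
            weight = max(beta[i][j], 0.0) + max(-beta[j][i], 0.0)
            if weight > 0:
                links[(i, j)] = weight
    return links
```

For example, a predator-prey pair with offdiagonal terms 2 and -3 becomes a single link of weight 5, while a competitive pair with terms -2 and -3 becomes two links with the magnitudes swapped.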

7.3 Webworld 

Webworld is another evolving ecology model, similar in some respects to EcoLab,
introduced by [5], with some modifications described in jT^]. It features
more realistic ecological interactions than does EcoLab, in that it tracks biomass
resources. It too has an interaction matrix, called a functional response in that
model, that could serve as a foodweb, which is converted to a directed weighted
graph in the same way as the EcoLab interaction matrix. I used the Webworld
implementation distributed with the EcoLab simulation platform [32]. The
parameters were chosen as $R = 10^5$, $b = 5 \times 10^{-3}$, $c = 0.8$ and $\lambda = 0.1$.

8 Methods and materials 

Tierra was run on a 512KB soup, with SlicePow set to 1, until the soup died,
typically after some $5 \times 10^{10}$ instructions had executed. Some variant runs were
performed with SlicePow=0.95, and with different random number generators,
but no difference in the outcome was observed.

The source code of Tierra 5.0 was modified in a few places, as described in the
Tierra section of this paper. The final source code is available as tierra.5.0.D7.tar.gz
from the EcoLab website hosted on SourceForge (http://ecolab.sf.net).

The genebanker output was processed by the eco-tierra.3.D13 code, also
available from the EcoLab website, to produce a list of phenotype equivalents for
each genotype. A function for processing the interaction log file generated by
Tierra and producing a timeseries of foodweb graphs was added to Eco-tierra.
The script for running this postprocessing step is process_ecollog.tcl.

The EcoLab model was adapted to convert the interaction matrix into a
foodweb and log the foodweb to disk every 1000 time steps for later processing.
The Webworld model was adapted similarly. The model parameters were as
documented in the included ecolab.tcl and webworld.tcl experiment files of the
ecolab.5.D1 distribution, which is also available from the EcoLab website.

Finally, each foodweb, whether real world or ALife generated, and 100
link-shuffled control versions were run through the network complexity algorithm
(6). This is documented in the cmpERmodel.tcl script of ecolab.5.D1. The
average and standard deviation of $\ln C$ was calculated, rather than $C$ directly,
as the shuffled complexity values fitted a log-normal distribution better than a
normal distribution. The difference between the measured complexity
and $\exp(\langle \ln C \rangle)$ (i.e. the geometric mean of the control network complexities) is
what is reported as the surplus in Figures 8-10.

Figure 8: Complexity of the Tierran interaction network for SlicePow=0.95, and
the associated complexity surplus. The surplus was typically 15-30 times the
standard deviation of the ER neutral network complexities.



9 ALife results 

Figures 8-10 show the computed complexity values and the surplus of the
complexity over the average randomly shuffled networks. In both the EcoLab and
Tierra cases, significant amounts of surplus complexity can be observed, but
not in the case of Webworld. In Webworld's case, this is probably because the
system complexity was not very high in the first place: the diversity typically
ranged between 5 and 10 species throughout the Webworld run. It is possible
that the competition parameter $c$ was too high for this run to generate any
appreciable foodweb complexity.

An earlier report of these results [37] did not show a significant complexity
surplus, unlike what is shown here. It was discovered that the process of writing the
foodweb data to an intermediate file stripped the weight data from the network links,
preserving only the graph's topological structure. It turns out that the complexity
surplus phenomenon is only present in weighted networks; unweighted
networks tend to be indistinguishable from randomly generated networks.

Figure 9: Complexity of EcoLab's foodweb, and the associated complexity surplus.
For most of the run, the surplus was 10-40 times the standard deviation of
the ER neutral network complexities, but in the final complexity growth phase,
it was several hundred.

Figure 10: Complexity of Webworld's foodweb, and the associated complexity
surplus. The surplus was typically 1-5 times the standard deviation of the ER
neutral network complexities.

10 Discussion 

In this paper, a modified version of a previously proposed simple representation 
language for n-node directed graphs is given. This modification leads to a 
more intuitive complexity measure that gives a minimum value for empty and 
full networks. When compared with the zcomplexity measure introduced in the 
previous paper, it is somewhat correlated with, and is at least as good a measure 
as zcomplexity, but has the advantage of being far more tractable. 

When compared with random networks created by shuffling the links, all real 
world example networks exhibited higher complexity than the random network. 
The difference between the complexity of the real network and the mean of the 
complexities of the randomly shuffled network represents structural information 
contained in the specific arrangement of links making up the network. To test 
the hypothesis that networks generated by evolutionary selection will also have 
a complexity surplus representing the functional nature of the network, I ran the 
same analysis on foodwebs generated by various artificial life systems. These 
too, exhibited the same complexity surplus seen in the real world networks. 

By contrast, I have applied this technique to some scale free networks generated
by the Barabási-Albert preferential attachment process, with varying
numbers of attachment points per node and different weight distributions.
None of these graphs exhibited a significant complexity surplus.

If the weights are stripped off the edges, then the effect disappears, which
provides a clue as to the origin of the effect. In equation (6), the components
dominating the sum will tend to be connected giant components of the overall
network with very few cycles. The randomly shuffled versions of these, however,
will tend to be composed of small disconnected components, with many of the
same motifs represented over and over again. Consequently, the shuffled network
has a great deal of symmetry, reducing the overall complexity. Even though
naively one might think that random networks should be the most complex
within the class of networks of a given node and link count, given the nature
of Kolmogorov complexity to favour random strings, the weighted complexity
formula (6) is sensitive to the structure within the network that has meaning to
the system behaviour that the network represents.

Returning to the question posed earlier about the amount of complexity stored
within the foodweb compared with the individual organisms' phenotypic
complexity, the results in Figure [5] give a preliminary answer. Each foodweb in
Figure [5] averaged around 100 species, the average phenotypic complexity of
which is around 150-200 bits (750 bits at maximum) [33]. So the 250-400 bits
of network complexity is but a drop in the bucket of phenotypic complexity.
Table 1: Complexity values of several freely available network datasets.
celegansneural, lesmis and adjnoun are available from Mark Newman's website,
representing the neural network of the C. elegans nematode [41], the coappearance
of characters in the novel Les Miserables by Victor Hugo [20] and the adjacency
network of common adjectives and nouns in the novel David Copperfield
by Charles Dickens [26]. The metabolic data of C. elegans [T3"] and protein
interaction network in yeast [TS] are available from Duncan Watts's website. PA1 and
PA3 are networks generated via preferential attachment with in degree of one or
three respectively, and uniformly distributed link weights. The other datasets
are food webs available from the Pajek website [2 EH |40l El [TJ [3]. For each
network, the number of nodes and links are given, along with the computed
complexity $C$. In the fourth column, the original network is shuffled 1000 times,
and the logarithm of the complexity is averaged ($\langle \ln C_{ER} \rangle$). The fifth column
gives the difference between these two values, which represents the information
content of the specific arrangement of links. The final column gives a measure
of the significance of this difference in terms of the number of standard deviations
("sigmas") of the distribution of shuffled networks. In two examples, the
distributions of shuffled networks had zero standard deviation, so $\infty$ appears in
this column.



